Abstract



Diversification of an investment into independently fluctuating assets reduces
its risk. In reality, movements of assets are mutually correlated, and
therefore knowledge of cross-correlations among asset price movements is
of great importance. Our results support the possibility that the problem
of finding an investment in stocks which exposes invested funds to a mini- 
mum level of risk is analogous to the problem of finding the magnetization 
of a random magnet. The interactions for this "random magnet problem" 
are given by the cross-correlation matrix C of stock returns. We find that 
random matrix theory allows us to make an estimate for C which outperforms 
the standard estimate in terms of constructing an investment which carries a 
minimum level of risk. 

Challenging optimization problems are encountered in many branches of science. Typical
examples include the traveling salesman problem [0-0] and the traveling tourist problem [Q . 
Another type of optimization problem occurs when system parameters are not accurately 
known and only estimates are available, such as in the problem of finding the least risky 
investment in the stock market which earns a given return. Such an investment is called 
an optimal portfolio. It has been suggested [f| that the calculation of an optimal portfolio 
has an analogy in pure physics: finding the ground state of a random magnet. However, 
the portfolio optimization problem is more intricate due to the fact that many "system" 
parameters such as correlations are not known with any degree of accuracy, but can only be 
estimated from empirical data. 

Two relevant pieces of information are necessary for an investor to judge the quality of an 
investment: the investor must know (i) the expected relative change in price ("return"), and 
(ii) the uncertainty of the return ("risk"), usually measured by the standard deviation of the 
returns over some preselected time intervals. Given two investments with the same return, 
the investment with smaller risk is preferred. One way to reduce risk is to diversify the 
investment, i.e., to buy stocks of not one, but of N different companies ||. Diversifying the 
investment would work best if the fluctuations of stock prices were completely uncorrelated; 
the risk would then decrease with N as $1/\sqrt{N}$. In reality, the price fluctuations of
different stocks are correlated. The challenging optimization problem is to choose the
fraction of money $m_i$ to be invested into each stock, where i runs over all N stocks, in
such a way as to minimize the effect of correlations on the risk of the N-stock portfolio.
We define the return $G_i$ as the relative price change of stock i, i = 1...N, and denote the
expected total return by $R = \sum_{i=1}^{N} m_i R_i$ with $R_i = \langle G_i \rangle$, the
return of an investment in company i over the investment period (in our empirical study
half a year).
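
As a minimal numerical sketch of the $1/\sqrt{N}$ statement, the snippet below evaluates the risk of an equal-weight portfolio using the usual expression for the variance of a weighted sum of returns. The function names, the common volatility `sigma0`, and the uniform pairwise correlation `rho` are illustrative assumptions, not quantities taken from the empirical study.

```python
import numpy as np

def portfolio_risk(m, sigma, C):
    """Standard deviation D of the portfolio return for weights m,
    per-stock standard deviations sigma and correlation matrix C."""
    cov = np.outer(sigma, sigma) * C      # covariance_ij = sigma_i C_ij sigma_j
    return float(np.sqrt(m @ cov @ m))

def equal_weight_risk(n, sigma0=0.2, rho=0.0):
    """Risk of an equal-weight portfolio of n stocks that all have
    volatility sigma0 and the same pairwise correlation rho (hypothetical)."""
    C = np.full((n, n), rho)
    np.fill_diagonal(C, 1.0)
    m = np.full(n, 1.0 / n)
    return portfolio_risk(m, np.full(n, sigma0), C)

# Uncorrelated stocks: risk falls as sigma0 / sqrt(n).
# Correlated stocks: risk levels off near sigma0 * sqrt(rho).
for n in (1, 4, 16, 64):
    print(n, equal_weight_risk(n), equal_weight_risk(n, rho=0.3))
```

With `rho = 0` the printed risk halves each time n is quadrupled, whereas with a positive `rho` it saturates, which is the reason correlations matter for diversification.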

The variance of R is

$$D^2 = \sum_{i,j=1}^{N} (\sigma_i C_{ij} \sigma_j)\, m_i m_j , \qquad (1)$$

where the cross-correlation matrix C is the covariance matrix normalized by the standard
deviations $\{\sigma_i\}$ of individual stocks. To study the influence of the cross-correlation
matrix on investment decisions we consider a straightforward investment problem first, where
short selling of stocks (i.e. borrowing stocks and selling them) is allowed at no extra cost. 
In addition, we consider a problem where all the capital is invested in stocks. Enforcing the 
constraints of fixed return R and fixed total capital $\sum_{i=1}^{N} m_i = 1$ by Lagrange
multipliers $\mu$ and $h$, the optimal portfolio is defined as the set $\{m_i\}$ found by
minimizing the function ||

$$F = \sum_{i,j=1}^{N} (\sigma_i C_{ij} \sigma_j)\, m_i m_j - \mu \sum_{i=1}^{N} m_i R_i - h \sum_{i=1}^{N} m_i , \qquad (2)$$

which is equivalent to the free energy of an Ising model with random couplings
$\sigma_i C_{ij} \sigma_j$ and a random magnetic field. From a physics point of view, selecting
an optimal portfolio amounts to calculating the mean field magnetizations $m_i$ of this
random Ising model with
the constraint of total magnetization one. An analytical solution exists since the free energy 
is quadratic. The expected return R is a monotonically increasing function of the standard 
deviation D. Thus, for accepting a large standard deviation (risk) the investor is rewarded 
with a high expected return. 
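
Because the objective is quadratic and both constraints are linear, the minimization reduces to one linear solve. The following is a sketch under that assumption: it minimizes the variance of Eq. (1) subject to a fixed return and unit total capital by solving the stationarity (KKT) equations directly. The function name and the three-stock numbers are hypothetical and serve only as an illustration.

```python
import numpy as np

def optimal_portfolio(C, sigma, R, target_return):
    """Weights m minimizing sum_ij sigma_i C_ij sigma_j m_i m_j subject to
    sum_i m_i R_i = target_return and sum_i m_i = 1.
    Short selling is allowed, so individual weights may be negative."""
    n = len(R)
    cov = np.outer(sigma, sigma) * C
    A = np.column_stack([R, np.ones(n)])          # one column per constraint
    # Stationarity: 2 cov m - mu R - h 1 = 0, together with both constraints.
    kkt = np.block([[2.0 * cov, -A],
                    [A.T, np.zeros((2, 2))]])
    rhs = np.concatenate([np.zeros(n), [target_return, 1.0]])
    sol = np.linalg.solve(kkt, rhs)
    m, (mu, h) = sol[:n], sol[n:]
    risk = float(np.sqrt(m @ cov @ m))
    return m, risk, mu, h

# Toy three-stock example with made-up inputs.
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
sigma = np.array([0.20, 0.30, 0.25])
R = np.array([0.05, 0.12, 0.08])
m, risk, mu, h = optimal_portfolio(C, sigma, R, target_return=0.09)
print(m, risk)
```

Repeating the solve for a range of target returns traces out the risk-return curve of the family of optimal portfolios.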

For the calculation of an optimal portfolio, one requires the 2N expectation values for 
future returns and standard deviations of stock returns, and estimates for the $N(N-1)/2$
independent elements $C_{ij}$. In practice, returns and standard deviations are estimated by
combining historical values with the judgement of analysts [10]. In contrast, cross-correlations
are estimated purely from historical time series as analysts usually have expertise in a spe- 
cific industry and therefore have difficulties evaluating cross-correlations between different 
industries. 

The problem of estimating cross-correlations is similar to knowing only Monte Carlo 
time series for the dynamics of spins and estimating the interactions between them from 
their correlations. In this physics problem, interactions are stationary in time and one can 
in principle calculate the exact correlation matrix by using infinitely long time series. In 
the stock market problem, correlations may not be stationary, and the use of long time 
series may not be possible. Estimating correlations from short time series is plagued by 
considerable statistical error.

Random matrix theory (RMT) allows one to separate noise and information in C by 
comparing the statistical properties of C to the properties of a random control R constructed
from i.i.d. time series JTTJT2] . Agreement between C and R is a signature of noise, whereas
deviations indicate meaningful information [|7|-|^,|T^-[T7| . Specifically, it was found that only 
the few eigenvectors with eigenvalues larger than the upper edge $\lambda_+$ of the random part of C
contain information about groups of correlated firms || and are useful for the construction 
of optimal portfolios P, |l5ip!7| . Here, we go considerably beyond the analysis in previous 
approaches. We (i) compare portfolios constructed with RMT methods to those constructed 
under the standard assumption that the only common influence on different stocks is the 
whole market and (ii) systematically study whether portfolios constructed with the RMT 
method have the lowest possible risk. 
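
The idea of a random control can be sketched numerically. The snippet below builds the correlation matrix of i.i.d. Gaussian series and compares its eigenvalue range with the Marchenko-Pastur edges $\lambda_\pm = (1 \pm \sqrt{N/T})^2$, which hold for N series of length T in the limit of large N and T at fixed ratio. Using this closed form for the upper edge, and the sizes N = 100 and T = 400, are assumptions made for illustration; the text does not state how $\lambda_+$ was obtained.

```python
import numpy as np

def correlation_matrix(returns):
    """Sample cross-correlation matrix of a (T x N) array of returns."""
    return np.corrcoef(returns, rowvar=False)

def marchenko_pastur_edges(n_series, n_obs):
    """Asymptotic eigenvalue bounds for the correlation matrix of
    i.i.d. unit-variance series with Q = n_obs / n_series > 1."""
    q = n_obs / n_series
    lam_minus = (1.0 - 1.0 / np.sqrt(q)) ** 2
    lam_plus = (1.0 + 1.0 / np.sqrt(q)) ** 2
    return lam_minus, lam_plus

rng = np.random.default_rng(0)
N, T = 100, 400                       # illustrative sizes, not the paper's data
control = correlation_matrix(rng.standard_normal((T, N)))
eigvals = np.linalg.eigvalsh(control)
lam_minus, lam_plus = marchenko_pastur_edges(N, T)
print(eigvals.min(), eigvals.max(), lam_minus, lam_plus)
```

For purely random data the eigenvalues stay close to the interval between the two edges, so eigenvalues of an empirical matrix that lie well above the upper edge are candidates for genuine information.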

We diagonalize C and rank-order its eigenvalues $\lambda_k$ such that $\lambda_{k+1} > \lambda_k$.
To filter from C the effects of the random part, we calculate the upper edge $\lambda_+$ of the
random part of C and find that $\lambda_{989}$ is the smallest eigenvalue larger than $\lambda_+$.
In order to keep only the part of C which contains information about correlated groups of
companies, we construct a 'filtered' diagonal matrix $\Lambda'$, whose elements are

$$\Lambda'_{ii} = \begin{cases} 0 & 1 \le i < 989 \\ \lambda_i & 989 \le i \le 1000 . \end{cases} \qquad (3)$$


We obtain the filtered correlation matrix $C'$ by transforming $\Lambda'$ to the basis of C. In addition,
we set the diagonal elements to one as every time series is completely correlated with itself. 
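
The filtering step can be sketched as follows: diagonalize the matrix, zero every eigenvalue that does not exceed the noise edge, transform back, and restore the unit diagonal. The function name and the synthetic one-factor test data are hypothetical, and the Marchenko-Pastur value used for `lam_plus` is an assumption for the purpose of the example.

```python
import numpy as np

def filter_correlation_matrix(C, lam_plus):
    """Rebuild C from the eigenmodes whose eigenvalue exceeds lam_plus,
    then reset the diagonal to one."""
    eigvals, eigvecs = np.linalg.eigh(C)               # ascending eigenvalues
    kept = np.where(eigvals > lam_plus, eigvals, 0.0)  # filtered diagonal matrix
    C_filtered = (eigvecs * kept) @ eigvecs.T          # V diag(kept) V^T
    np.fill_diagonal(C_filtered, 1.0)
    return C_filtered, int(np.count_nonzero(kept))

# Synthetic data: one common factor plus independent noise.
rng = np.random.default_rng(1)
N, T = 50, 500
market = rng.standard_normal(T)
returns = market[:, None] + rng.standard_normal((T, N))
C = np.corrcoef(returns, rowvar=False)
lam_plus = (1.0 + np.sqrt(N / T)) ** 2                 # assumed noise edge
C_filtered, n_kept = filter_correlation_matrix(C, lam_plus)
print(n_kept)
```

In this toy setting only the common-factor mode survives the filter; for the empirical matrix of the text, twelve modes lie above the edge.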

We compare the proposed method to a method in which the cross-correlation matrix
$C''$ is calculated under the assumption that the only common influence on two stocks is
the whole market, i.e. the one-factor model ||. This assumption is widespread: on the
one hand, it is known that the price of a market index such as the S&P500 (comprising the
500 largest US stocks) has a big influence on the price of individual stocks. On the other
hand, there have been many attempts to identify further factors influencing the price of
groups of stocks, but none of these models was found to have larger predictive power than
the simple assumption that only the market index influences stock prices ||. If $G_M(t)$
denotes the return of the market index (we use the S&P500 index), then the return of stock i
is $G_i(t) = R_i + \beta_i G_M(t) + \epsilon_i(t)$, where $\epsilon_i(t)$ are random variables
describing the component of the return of stock i which is both independent of the market
and independent of all other stocks, and $\beta_i$ describes the response of stock i to a price
change of the market. The cross-correlation matrix $C''$ has elements
$C''_{ij} = \beta_i \beta_j \sigma_M^2 / (\sigma_i \sigma_j)$, where the standard deviations of
$G_M$ and $\epsilon_i$ are $\sigma_M$ and $\sigma_{\epsilon_i}$, respectively.
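
A one-factor correlation matrix of this kind can be estimated from data along the following lines. This is a sketch with assumed choices: each $\beta_i$ is obtained by least-squares regression of the stock return on the index return, $\sigma_i$ is taken as the total standard deviation of stock i, and the diagonal is set to one by hand. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def one_factor_correlation(stock_returns, market_returns):
    """Off-diagonal elements beta_i beta_j sigma_M^2 / (sigma_i sigma_j),
    with ones on the diagonal. stock_returns is (T x N), market_returns is (T,)."""
    G = stock_returns - stock_returns.mean(axis=0)
    gm = market_returns - market_returns.mean()
    var_m = gm @ gm / len(gm)                    # sigma_M^2
    beta = (G.T @ gm) / len(gm) / var_m          # least-squares slopes
    sigma = G.std(axis=0)                        # total std of each stock
    C = np.outer(beta, beta) * var_m / np.outer(sigma, sigma)
    np.fill_diagonal(C, 1.0)
    return C, beta

# Synthetic returns generated from a one-factor model (illustrative only).
rng = np.random.default_rng(2)
T, N = 300, 5
market = 0.01 * rng.standard_normal(T)
betas_true = np.linspace(0.5, 1.5, N)
stocks = market[:, None] * betas_true + 0.01 * rng.standard_normal((T, N))
C2, beta = one_factor_correlation(stocks, market)
print(np.round(C2, 3))
```

With these choices each off-diagonal element equals the product of the two stocks' correlations with the index, so the model carries N parameters instead of N(N-1)/2.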

To compare the quality of the RMT forecast with that of the control, we analyze 30-min
returns of the N = 1000 largest US stocks for the year 1994 [18]. We partition the year 1994
into two six-month periods A and B and use the first period to calculate the RMT forecast
$C'$ and the one-factor model forecast $C''$ for the empirical matrix $C^B$ in the second period.
As can be seen from Eq. (2), one needs the future returns and standard deviations as an
input in addition to C in order to calculate a portfolio. In practice these quantities are
estimated by specialists [10]. We use instead the returns and volatilities actually realized
in the second period PJIi5|. In this way, we probe only the effect of randomness in the
correlation coefficients, and our results are not influenced by uncertainties in returns and
standard deviations. With this input we calculate optimal portfolios, i.e., the weights $\{m_i\}$
of investment made into stock i for $C^A$, $C'$, and $C''$. Given these weights, we calculate the
risk for a given value of return. 

We use three different tests to evaluate the performance of the RMT method as regards 
reducing risk. First, we compare the predicted risk to the risk which would have been 
realized if someone had invested using the set of weights $\{m_i\}$. We calculate this realized
risk by using the empirical cross-correlation matrix $C^B$ in Eq. (1). In agreement with [15],
we find that the empirical matrix $C^A$ is a very poor forecast for $C^B$, as the realized risk is
170% higher than the predicted one (relative difference). For portfolios constructed with
the RMT forecast $C'$ [15] and with the standard forecast $C''$, the relative difference between
predicted and realized risk is only 22% and 33%, respectively. In addition to the higher
accuracy in forecasting risk, the realized risk for both $C'$ and $C''$ is considerably smaller than
for the empirical matrix $C^A$ (Fig. 1).
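
The comparison of predicted and realized risk amounts to evaluating the same weights with two different correlation matrices. A small sketch, with hypothetical function names and a two-stock toy example in place of real data:

```python
import numpy as np

def risk(m, sigma, C):
    """Portfolio standard deviation for weights m, volatilities sigma
    and correlation matrix C."""
    cov = np.outer(sigma, sigma) * C
    return float(np.sqrt(m @ cov @ m))

def relative_risk_gap(m, sigma, C_forecast, C_realized):
    """Relative difference between realized and predicted risk."""
    predicted = risk(m, sigma, C_forecast)
    realized = risk(m, sigma, C_realized)
    return realized / predicted - 1.0

# Toy example: the forecast assumes no correlation, the realized
# correlation turns out to be 0.5.
m = np.array([0.5, 0.5])
sigma = np.array([0.2, 0.2])
C_forecast = np.eye(2)
C_realized = np.array([[1.0, 0.5],
                       [0.5, 1.0]])
gap = relative_risk_gap(m, sigma, C_forecast, C_realized)
print(gap)
```

In this toy case the realized risk exceeds the predicted one by a factor of $\sqrt{1.5}$, i.e. a relative difference of roughly 22%; a value of 1.7 would correspond to the 170% quoted for the unfiltered empirical matrix.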

Next, we compare portfolios constructed with the standard forecast $C''$ against portfolios
constructed with the RMT forecast $C'$. We find that for a return of 15% the realized risk for
the "filtered" portfolios is 5% smaller than the realized risk for the "standard" portfolios.
A similar reduction of risk is also apparent for other expected returns (Fig. 2). Thus, the
RMT method not only provides better estimates of future risks than the standard method,
but also allows one to construct portfolios with a considerably reduced realized risk.

Finally, we study whether the RMT method really suggests the optimal number of
eigenvalues which should be kept when constructing the cleaned cross-correlation matrix. We
calculate a family of cross-correlation matrices $C'_p$ by keeping the largest p eigenvalues in
the diagonal matrix $\Lambda'$ instead of keeping 12 as in Eq. (3). In Fig. 3 the realized risk for 15%
return is plotted against the number p of eigenvalues. For a range of 4 < p < 25 the level of
realized risk fluctuates around the risk for p = 12 (RMT suggestion). Hence we conclude
that the RMT method provides a good estimate for the forecast of future cross-correlations.
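
A scan over the number p of retained eigenvalues can be sketched as below. To keep the example short it uses a simplification that differs from the study: the portfolio is the global minimum-variance portfolio (only the total-capital constraint) rather than one with a fixed 15% return. All function names and the synthetic two-period data are hypothetical.

```python
import numpy as np

def keep_largest(C, p):
    """Correlation matrix rebuilt from the p largest eigenmodes of C,
    with the diagonal reset to one."""
    eigvals, eigvecs = np.linalg.eigh(C)       # ascending eigenvalues
    kept = eigvals.copy()
    kept[: len(kept) - p] = 0.0                # zero all but the p largest
    C_p = (eigvecs * kept) @ eigvecs.T
    np.fill_diagonal(C_p, 1.0)
    return C_p

def min_variance_weights(C, sigma):
    """Weights minimizing the variance subject to sum_i m_i = 1."""
    cov = np.outer(sigma, sigma) * C
    w = np.linalg.solve(cov, np.ones(len(sigma)))
    return w / w.sum()

def realized_risk_scan(C_A, C_B, sigma, p_values):
    """Realized risk (evaluated with C_B) of portfolios built from
    the p-mode approximation of C_A, for each p."""
    cov_B = np.outer(sigma, sigma) * C_B
    risks = {}
    for p in p_values:
        m = min_variance_weights(keep_largest(C_A, p), sigma)
        risks[p] = float(np.sqrt(m @ cov_B @ m))
    return risks

# Synthetic data: one common factor, split into periods A and B.
rng = np.random.default_rng(3)
N, T = 30, 200
market = rng.standard_normal(2 * T)
returns = 0.7 * market[:, None] + rng.standard_normal((2 * T, N))
C_A = np.corrcoef(returns[:T], rowvar=False)
C_B = np.corrcoef(returns[T:], rowvar=False)
sigma = np.ones(N)
risks = realized_risk_scan(C_A, C_B, sigma, range(0, N + 1, 5))
print(risks)
```

Here p = N reproduces the unfiltered matrix and p = 0 the identity; plotting `risks` against p gives the analogue of the curve described above.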

Having found that the cleaned cross-correlation matrix $C'$ is indeed a good choice for
portfolio optimization, we want to come back to the random magnet analogy and ask to 
what type of random magnet the portfolio problem corresponds. For an investment in the 
stock market as described by a linear constraint fixing the total invested capital Eq.(2), one 
cannot find a phase transition. Instead, the covariance matrix acts like a susceptibility and 
the amount of invested capital depends on the ratio of expected return to expected volatility 
of an eigenmode. Alternatively, one can study an investment in futures markets, where the 
investor is asked to leave a deposit proportional to the value of the asset. This leads to a 
nonlinear constraint $\sum_{i=1}^{N} |m_i|$ instead of the magnetic field term in Eq. (2). Extrema of the
free energy are described by coupled equations for the signs $S_i = \mathrm{sign}[m_i]$

$$S_i = \mathrm{sign}\Big[ \sum_{j=1}^{N} J_{ij} (\mu R_j + h S_j) \Big] , \qquad (4)$$

where $(J^{-1})_{ij} = C_{ij} \sigma_i \sigma_j$. In Ref. this optimization problem was studied for a historical
cross-correlation matrix and found to be related to spin glasses. Here, we argue that for 
the cleaned matrix $C'$ one has to solve the problem of ferromagnetic clusters in a random
magnetic field. To see the difference, we compare the eigenvectors of $C^A$ and $C'$. For each
eigenvector, we are interested in the number $N_s$ of significant components, which can be
measured by one over the inverse participation ratio (IPR) [19]. We analyze the eigenvectors
of the matrices $C^A_{ij}\sigma_i\sigma_j$ and $C'_{ij}\sigma_i\sigma_j$. The number of significant
components of the eigenvectors of these matrices (which are also the eigenvectors of the
inverse matrices used in Eq. (4)) is displayed in Fig. 4. Many of the eigenvectors of $C^A$ have
more than 200 significant components and describe long range frustrated interactions giving
rise to a spin glass type magnetic problem ||. On the other hand, all but one of the
eigenvectors of $C'$ have fewer than 30 significant components. The eigenvector corresponding
to the largest eigenvalue has 285 significant components and describes the influence of the
whole market on the price dynamics of an individual stock. In terms of the magnetic model,
it describes a long range ferromagnetic interaction. The 999 eigenvectors with a small number
of significant components describe the fluctuations of individual stocks or ferromagnetic
interaction of small clusters of stocks which can be identified as business sectors ||. Hence we suggest that
the magnetic problem equivalent to the portfolio problem with a cleaned cross-correlation 
matrix is a random field ferromagnet. 

In summary, we used random matrix theory to estimate cross-correlations and find that 
this method allows us to find investments with substantially reduced risk compared to con- 
ventionally used methods. To accomplish this, we exploited a formal analogy with the 
"random magnet problem", and analyzed the cross-correlation matrix C of stock returns 
for short time intervals extending over a one-year period. We find an estimate for C that 
outperforms the standard estimate, and allows us to construct an investment which exposes 
the invested capital to only a minimum level of risk. 

FIG. 1. Portfolio return R as a function of risk D for the families of optimal portfolios
constructed from (a) the original matrix $C^A$, (b) the filtered matrix $C'$, and (c) the control $C''$. The
curves on the left show the predicted level of risk, whereas the curves on the right show the realized
risk D calculated using the correlation matrix $C^B$ for the second half of 1994. The ratio of realized
to predicted risk is smallest for the RMT method (b), followed by the control (c), and largest for
the original matrix (a).

FIG. 2. Comparison of the realized risk for the family of portfolios constructed from $C'$ (RMT
method) and $C''$ (conventional method). For a given return, the RMT portfolios are characterized
by a lower level of risk than the conventional portfolios.

FIG. 3. Dependence of the realized risk on the number of eigenvalues kept in the calculation of
the cleaned cross-correlation matrix $C'$. For this plot, the level of realized return is chosen as 15%.
RMT suggests that keeping 12 eigenvalues is the best choice for minimizing risk.

FIG. 4. The number $N_s$ of significant components of the eigenvectors of $C^A$ (circles) and $C'$
(triangles) is plotted against the rank of the eigenvector. $N_s$ is defined as one over the inverse
participation ratio. Most of the eigenvectors of $C^A$ have a large $N_s$, whereas all but one of the
eigenvectors of $C'$ have a small $N_s$, indicating individually fluctuating stocks or interactions between
small clusters of stocks. The last eigenvector with $N_s = 285$ describes the influence of the whole
market and corresponds to a long range ferromagnetic interaction in the magnetic analogy.